survey article
S'MoRE: Structural Mixture of Residual Experts for Parameter-Efficient LLMFine-tuning
Fine-tuning pre-trained large language models (LLMs) presents a dual challenge of balancing parameter efficiency and model capacity. Existing methods like low-rank adaptations (LoRA) are efficient but lack flexibility, while Mixture-of-Experts (MoE) enhance model capacity at the cost of more & under-utilized parameters. To address these limitations, we propose Structural Mixture of Residual Experts (S'MoRE), a novel framework that seamlessly integrates the efficiency of LoRA with the flexibility of MoE. Conceptually, S'MoRE employs hierarchical low-rank decomposition of expert weights, yielding residuals of varying orders interconnected in a multi-layer structure.
Stop DDoS Attacking the Research Community with AI-Generated Survey Papers
Survey papers are foundational to the scholarly progress of research communities, offering structured overviews that guide both novices and experts across disciplines. However, the recent surge of AI-generated surveys, especially enabled by large language models (LLMs), has transformed this traditionally labor-intensive genre into a low-effort, high-volume output. While such automation lowers entry barriers, it also introduces a critical threat: the phenomenon we term the "survey paper DDoS attack" to the research community. This refers to the unchecked proliferation of superficially comprehensive but often redundant, low-quality, or even hallucinated survey manuscripts, which floods preprint platforms, overwhelms researchers, and erodes trust in the scientific record. In this position paper, we argue that we must stop uploading massive amounts of AI-generated survey papers (i.e., survey paper DDoS attack) to the research community, by instituting strong norms for AI-assisted review writing. We call for restoring expert oversight and transparency in AI usage and, moreover, developing new infrastructures such as Dynamic Live Surveys, community-maintained, version-controlled repositories that blend automated updates with human curation. Through quantitative trend analysis, quality audits, and cultural impact discussion, we show that safeguarding the integrity of surveys is no longer optional but imperative to the research community.
Energy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OODGeneralization
Recent approaches for vision-language models (VLMs) have shown remarkable success in achieving fast downstream adaptation. When applied to real-world downstream tasks, VLMs inevitably encounter both the in-distribution (ID) data and out-of-distribution (OOD) data. The OOD datasets often include both covariate shifts (e.g., known classes with changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of improving VLMs' generalization ability to covariate-shifted OOD data, while effectively detecting open-set semantic-shifted OOD classes. In this paper, inspired by the substantial energy change observed in closed-set data when re-aligning vision-language modalities--specifically by directly reducing the maximum cosine similarity to a low value--we introduce a novel OOD score, named Energy.
OASIS: One-Shot Federated Graph Learning via Wasserstein Assisted Knowledge Integration
Federated Graph Learning (FGL) offers a promising framework for collaboratively training Graph Neural Networks (GNNs) while preserving data privacy. In resourceconstrained environments, One-shot Federated Learning (OFL) emerges as an effective solution by limiting communication to a single round. Current OFL approaches employing generative models have attracted considerable attention; however, they face unresolved challenges: these methods are primarily designed for traditional image data and fail to capture the fine-grained structural information of local graph data. Consequently, they struggle to integrate the intricate correlations necessary and transfer subtle structural insights from each client to the global model. To address these issues, we introduce OASIS, an innovative one-shot FGL framework. In OASIS, we propose a Synergy Graph Synthesizer designed to generate informative synthetic graphs and introduce a Topological Codebook to construct a structural latent space. Moreover, we propose the WassersteinEnhanced Semantic Affinity Distillation (WESAD) to incorporate rich inter-class relationships and the Wasserstein-Driven Structural Relation Distillation (WDSRD) to facilitate the effective transfer of structural knowledge from the Topological Codebook. Extensive experiments on real-world tasks demonstrate the superior performance and generalization capability of OASIS, with an average improvement of 15.81% over the baseline.
Fit the Distribution: Cross-Image/Prompt Adversarial Attacks on Multimodal Large Language Models
Although Multimodal Large Language Models (MLLMs) have demonstrated remarkable achievements in recent years, they remain vulnerable to adversarial examples that result in harmful responses. Existing attacks typically focus on optimizing adversarial perturbations for a certain multimodal image-prompt pair or fixed training dataset, which often leads to overfitting. Consequently, these perturbations fail to remain malicious once transferred to attack unseen image-prompt pairs, suffering from significant resource costs to cover the diverse multimodal inputs in complicated real-world scenarios. To alleviate this issue, this paper proposes a novel adversarial attack on MLLMs based on distribution approximation theory, which models the potential image-prompt input distribution and adds the same distribution-fitting adversarial perturbation on multimodal input pairs to achieve effective cross-image/prompt transfer attacks. Specifically, we exploit the Laplace approximation to model the Gaussian distribution of the image and prompt inputs for the MLLM, deriving an estimate of the mean and covariance parameters. By sampling from this approximated distribution with Monte Carlo mechanism, we efficiently optimize and fit a single input-agnostic perturbation over diverse image-prompt pairs, yielding strong universality and transferability. Extensive experiments are conducted to verify the strong adversarial capabilities of our proposed attack against prevalent MLLMs spanning a spectrum of images/prompts.
AnaCP: Toward Upper-Bound Continual Learning via Analytic Contrastive Projection
This paper studies the problem of class-incremental learning (CIL), a core setting within continual learning where a model learns a sequence of tasks, each containing a distinct set of classes. Traditional CIL methods, which do not leverage pretrained models (PTMs), suffer from catastrophic forgetting (CF) due to the need to incrementally learn both feature representations and the classifier. The integration of PTMs into CIL has recently led to efficient approaches that treat the PTM as a fixed feature extractor combined with analytic classifiers, achieving state-ofthe-art performance. However, they still face a major limitation: the inability to continually adapt feature representations to best suit the CIL tasks, leading to suboptimal performance. To address this, we propose AnaCP (Analytic Contrastive Projection), a novel method that preserves the efficiency of analytic classifiers while enabling incremental feature adaptation without gradient-based training, thereby eliminating the CF caused by gradient updates. Our experiments show that AnaCP not only outperforms existing baselines but also achieves the accuracy level of joint training, which is regarded as the upper bound of CIL.
Probabilistic Stability Guarantees for Feature Attributions
Stability guarantees have emerged as a principled way to evaluate feature attributions, but existing certification methods rely on heavily smoothed classifiers and often produce conservative guarantees. To address these limitations, we introduce soft stability and propose a simple, model-agnostic, sample-efficient stability certification algorithm (SCA) that yields non-trivial and interpretable guarantees for any attribution method. Moreover, we show that mild smoothing achieves a more favorable trade-off between accuracy and stability, avoiding the aggressive compromises made in prior certification methods. To explain this behavior, we use Boolean function analysis to derive a novel characterization of stability under smoothing. We evaluate SCA on vision and language tasks and demonstrate the effectiveness of soft stability in measuring the robustness of explanation methods.
Multi-order Orchestrated Curriculum Distillation for Model-Heterogeneous Federated Graph Learning
Federated Graph Learning (FGL) has been shown to be particularly effective in enabling collaborative training of Graph Neural Networks (GNNs) in decentralized settings. Model-heterogeneous FGL further enhances practical applicability by accommodating client preferences for diverse model architectures. However, existing model-heterogeneous approaches primarily target Euclidean data and fail to account for a crucial aspect of graph-structured data: topological relationships. To address this limitation, we propose TRUST, a novel knowledge distillation-based modelheterogeneous FGL framework. Specifically, we propose Progressive Curriculum Node Scheduler to progressively introduce challenging nodes based on learning difficulty. In Adaptive Curriculum Distillation Modulator, we propose an adaptive temperature modulator that dynamically adjusts knowledge distillation temperature to accommodate varying client capabilities and graph complexity. Moreover, we leverage Wasserstein-Driven Affinity Distillation to enable models to capture crossclass structural relationships through optimal transport. Extensive experiments on multiple graph benchmarks and model-heterogeneous settings show that TRUST outperforms existing methods, achieving an average 3.6% performance gain, particularly under moderate heterogeneity conditions.
MOTION: Multi-Sculpt Evolutionary Coarsening for Federated Continual Graph Learning
Graph neural networks (GNNs) have achieved remarkable success in various domains but typically rely on centralized, static graphs, which limits their applicability in distributed, evolving environments. To address this limitation, we define the task of Federated Continual Graph Learning (FCGL), a paradigm for incremental learning on dynamic graphs distributed across decentralized clients. Existing methods, however, neither preserve graph topology during task transitions nor mitigate parameter conflicts in server-side aggregation. To overcome these challenges, we introduce MOTION, a generalizable FCGL framework that integrates two complementary modules: the Graph Topology-preserving Multi-Sculpt Coarsening (G-TMSC) module, which maintains the structural integrity of past graphs through a multi-expert, similarity-guided fusion process, and the Graph-Aware Evolving Parameter Adaptive Engine (G-EPAE) module, which refines global model updates by leveraging a topology-sensitive compatibility matrix. Extensive experiments on real-world datasets show that our approach improves average accuracy (AA) by an average of 30% over the FedAvg baseline across five datasets while maintaining a negative average forgetting (AF) rate, significantly enhancing generalization and robustness under FCGL settings.
Towards Visualization-of-Thought Jailbreak Attack against Large Visual Language Models
As Visual Language Models (VLMs) continue to evolve, they have demonstrated increasingly sophisticated logical reasoning capabilities and multimodal thought generation, opening doors to widespread applications. However, this advancement raises serious concerns about content security, particularly when these models process complex multimodal inputs requiring intricate reasoning. When faced with these safety challenges, the critical competition between logical reasoning and safety objectives of VLMs is often overlooked in previous works. In this paper, we introduce Visualization-of-Thought Attack (VoTA), a novel and automated attack framework that strategically constructs chains of images with risky visual thoughts to challenge victim models.